89 research outputs found

    Simulation experiments for similarity indexes between two hierarchical clusterings

    Morlini and Zani (2012) have proposed a new dissimilarity index for comparing two hierarchical clusterings on the basis of the whole dendrograms. They have presented and discussed its basic properties and have shown that the index can be decomposed into contributions pertaining to each stage of the hierarchies. Then, they have obtained a similarity index $S$ as the complement to one of the suggested distance and have shown that its single components $S_k$ obtained at each stage $k$ of the hierarchies can be related to the measure $B_k$ suggested by Fowlkes & Mallows (1983) and to the Rand index $R_k$. In this paper, we report results of a series of simulation experiments aimed at comparing the behavior of these new indexes with other well-established similarity measures, over different experimental conditions. The first set of simulations is aimed at determining the behavior of the indexes when the clusterings being compared are unrelated. The second set tries to investigate the robustness to different levels of nois
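
The two reference measures named in this abstract, the Fowlkes and Mallows measure B_k and the Rand index R_k, both compare the partitions obtained by cutting two hierarchies at the same stage k. The sketch below shows the standard pair-counting definitions of these two measures only; it is not the index proposed by Morlini and Zani, whose formula is not given here. The function names and the convention of returning 0.0 when a partition has no co-clustered pairs are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pair_counts(labels_a, labels_b):
    """Classify every pair of objects by whether each partition joins it."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    both = only_a = only_b = neither = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1
        elif same_a:
            only_a += 1
        elif same_b:
            only_b += 1
        else:
            neither += 1
    return both, only_a, only_b, neither


def rand_index(labels_a, labels_b):
    """Share of object pairs on which the two partitions agree."""
    both, only_a, only_b, neither = pair_counts(labels_a, labels_b)
    total = both + only_a + only_b + neither
    if total == 0:
        raise ValueError("need at least two objects")
    return (both + neither) / total


def fowlkes_mallows(labels_a, labels_b):
    """Pairs joined in both partitions, scaled by the geometric mean
    of the pairs joined in each partition separately."""
    both, only_a, only_b, _ = pair_counts(labels_a, labels_b)
    denom = sqrt((both + only_a) * (both + only_b))
    # Assumed convention: 0.0 when either partition joins no pairs.
    return both / denom if denom else 0.0


# Cluster labels of four objects in two partitions at the same stage.
cut_a = [0, 0, 1, 1]
cut_b = [0, 0, 1, 2]
print(rand_index(cut_a, cut_b), fowlkes_mallows(cut_a, cut_b))
```

Applying both functions to the cuts of two dendrograms at every stage k gives one value per stage, which is the level at which the abstract relates $S_k$ to $B_k$ and $R_k$.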

    On multicollinearity and concurvity in some nonlinear multivariate models

    Recent developments in multivariate smoothing methods provide a rich collection of feasible models for nonparametric multivariate data analysis. Among the most interpretable are those with smoothed additive terms. The construction of various methods and algorithms for computing the models has been the main concern in the literature in this area. Fewer results are available, however, on the validation of the computed fit, and many applications of nonparametric methods end up computing and comparing the generalized validation error or related indexes. This article reviews the behavior of some of the best known multivariate nonparametric methods, based on subset selection and on projection, when (exact) collinearity or multicollinearity (near collinearity) is present in the input matrix. It shows the possible aliasing effects in computed fits of some selection methods and explores the properties of the projection spaces reached by projection methods in order to help data analysts select the best model in the case of ill-conditioned input matrices. Two simulation studies and a real data set application are presented to illustrate further the effects of collinearity or multicollinearity in the fit

    Radial basis function networks with partially classified data

    The problem of estimating a classification rule with partially classified observations, which often occurs in biological and ecological modelling, and which is of major interest in pattern recognition, is discussed. Radial basis function networks for classification problems are presented and compared with discriminant analysis with partially classified data, in situations where some observations in the training set are unclassified. An application to a set of morphometric data obtained from the skulls of 288 specimens of Microtus subterraneus and Microtus multiplex is performed. This example illustrates how the use of both classified and unclassified observations in the estimate of the hidden layer parameters has the potential to greatly improve the network performance

    Variable selection in cluster analysis: an approach based on a new index

    In cluster analysis, the inclusion of unnecessary variables may mask the true group structure. For the selection of the best subset of variables, we suggest the use of two overall indices. The first index is a distance between two hierarchical clusterings and the second one is a similarity index obtained as the complement to one of the previous distance. Both criteria can be used for measuring the similarity between clusterings obtained with different subsets of variables. An application with a real data set regarding the economic welfare of the Italian Regions shows the benefits gained with the suggested procedure

    New weighted similarity indexes for market segmentation using categorical variables

    In this paper we introduce new similarity indexes for categorical data with nominal scale. In contrast to traditionally used similarity measures, they also consider the frequency of the modalities of each attribute in the sample. This feature is useful when dealing with rare categories, since it makes sense to evaluate the pairwise presence of a rare category differently from the pairwise presence of a widespread one. We also propose a specific weighted index for dependent categorical variables. The suitability of the proposed measures from a marketing research perspective is shown using two real-world data sets

    Assessing decoding ability: the role of speed and accuracy and a new composite indicator to measure decoding skill in elementary grades

    Tools for assessing decoding skill in students attending elementary grades are of fundamental importance for guaranteeing an early identification of reading disabled students and reducing both the primary negative effects (on learning) and the secondary negative effects (on the development of the personality) of this disability. This article presents results obtained by administering existing standardized tests of reading and a new screening procedure to about 1,500 students in the elementary grades in Italy. It is found that variables measuring speed and accuracy in all administered reading tests are not Gaussian, and therefore the threshold values used for classifying a student as a normal decoder or as an impaired decoder must be estimated on the basis of the empirical distribution of these variables rather than by using the percentiles of the normal distribution. It is also found that the decoding speed and the decoding accuracy can be measured in either a 1-minute procedure or in much longer standardized tests. The screening procedure and the tests administered are found to be equivalent insofar as they carry the same information. Finally, it is found that speed and accuracy act as complementary effects in the measurement of decoding ability. On the basis of this last finding, the study introduces a new composite indicator aimed at determining the student's performance, which combines speed and accuracy in the measurement of decoding ability
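
The point about non-Gaussian scores can be illustrated with a small sketch that compares a cut-off taken from a fitted normal distribution with one taken from the empirical distribution. Everything specific here is an assumption for illustration: the function name, the choice of the 5th percentile, and the synthetic right-skewed scores are hypothetical and do not come from the study.

```python
from statistics import NormalDist, mean, quantiles, stdev


def low_score_thresholds(scores, pct=5):
    """Return (gaussian, empirical) cut-offs at the given percentile.

    `pct` is a hypothetical choice; the abstract does not state
    which percentile is used to flag an impaired decoder.
    """
    # Cut-off implied by a normal distribution fitted to the sample.
    fitted = NormalDist(mean(scores), stdev(scores))
    gaussian = fitted.inv_cdf(pct / 100)
    # Cut-off read directly from the sample: 99 percentile cut points.
    cuts = quantiles(scores, n=100, method="inclusive")
    empirical = cuts[pct - 1]
    return gaussian, empirical


# Synthetic, strongly right-skewed scores (not real test data).
scores = [i ** 2 for i in range(1, 101)]
gaussian, empirical = low_score_thresholds(scores)
print(gaussian, empirical)
```

With skewed data like this, the normal-based cut-off falls below the smallest possible score, so it would flag no one, while the empirical percentile still separates the lowest scorers.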

    Searching for structure in air pollutants concentration measurements

    When studying air pollution measurements at different sites in a spatial area, we may search for a typical pattern, common to all curves, describing the underlying air pollution process in a pre-specified period. Another area of interest to support local authorities in air quality management may be the classification of the different sites in homogeneous clusters and the group ranking that follows. Yet, there is variation in both amplitude and dynamics among the air pollutant concentrations measured at the different monitoring stations. Analyzing such measurements, where the basic unit of information is the entire observed process rather than a string of numbers, involves finding the time shifts or the warping functions among curves. The analysis is much more complicated if we consider a multivariate process, that is, vector-valued air pollutant measurements. Following our previous work where an improved dynamic time-warping algorithm has been developed, especially in the multivariate case, and used both for classifying functional data and estimating the structural mean of a sample of curves, we analyzed the measurements of some air pollutants in Emilia Romagna (northern Italy). In addition, for the univariate analyses, we applied the self-modeling warping function approach, which is also convenient for these data. Indeed, this method was found to be model-free and flexible enough to capture very complex and highly non-linear patterns
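
As background for the time-warping step, here is a minimal sketch of the classic univariate dynamic time-warping distance. It is the textbook dynamic-programming version, not the improved or multivariate algorithm the authors refer to, and the function name and the absolute-difference local cost are assumptions made for illustration.

```python
def dtw_distance(x, y):
    """Classic dynamic time-warping cost between two numeric sequences."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of x[:i] with y[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            # Extend the best of: advance x only, advance y only, advance both.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# Two curves with the same shape but a repeated value in the second one.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because the alignment may stretch either sequence, curves that differ only by a time shift of this kind get a cost of zero, which is what makes warping useful for comparing measurements from different monitoring stations.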

    Some experimental results on the role of speed and accuracy of reading in psychometric tests.

    According to the Italian Parliament act (n. 170/2010) that recognizes dyslexia as a physical disturbance of neurobiological origin, dyslexic children in primary school should be identified early, in order to assess a targeted intervention within the school and to start teaching that respects their difficulties in learning to read, to write and to perform calculations. Screening procedures inside primary schools aimed at detecting children with difficulties in reading are not as common in Italy as in other European countries. Nevertheless, screening procedures are of fundamental importance for guaranteeing an early detection of dyslexic children and reducing both the primary negative effects (on learning) and the secondary negative effects (on the development of the personality) of this disturbance. In this study we analyze the validity, from a statistical point of view, of a screening procedure recently proposed in the psychometric literature (Stella et al., 2011). This procedure is very fast (it is exactly one minute long), simple and cheap, and can be administered by teachers without psychometric experience. By contrast, the currently used tests are much longer and must be administered by skilled teachers. These two major flaws prevent the widespread use of these tests. If the new procedure is found to be reliable, it can be given to each student in primary school and it can also be repeated over time, in order to monitor the children's difficulties. The validity of the procedure and the benchmark with two currently used tests are studied on the basis of the results of a survey of about 1500 students attending primary school